Speaker compensation with sine-log all-pass transforms
نویسندگان
چکیده
In recent work, we proposed the rational all-pass transform (RAPT) as the basis of a speaker adaptation scheme intended for use with a large vocabulary speech recognition system. It was shown that RAPT-based adaptation reduces to a linear transformation of cepstral means, much like the better known maximum likelihood linear regression (MLLR). In a set of speech recognition experiments conducted on the Switchboard Corpus, we obtained a word error rate (WER) of 37.9% using RAPT adaptation, a significant improvement over the 39.5% WER achieved with MLLR. In the present work, we propose the sine-log all-pass transform (SLAPT) as a replacement for the RAPT. Our findings indicate the SLAPT is just as effective as the RAPT at reducing WER when used as the basis for a variety of speaker compensation schemes, but in addition conduces to far more tractable computation of transformed cepstral sequences, and the estimation of optimal transform parameters.
منابع مشابه
Speaker Normalization with All-pass Transforms Center for Language and Speech Processing 72 Speaker Normalization with All-pass Transforms
Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match some speaker independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme wherein the frequency axis of the short-time spectrum associated with a particular speaker’s speech is rescaled or warped prior to the extraction ...
متن کاملSpeaker Identification using Frequency Dsitribution in the Transform Domain
In this paper, we propose Speaker Identification using the frequency distribution of various transforms like DFT (Discrete Fourier Transform), DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), Hartley, Walsh, Haar and Kekre transforms. The speech signal spoken by a particular speaker is converted into frequency domain by applying the different transform techniques. The distributio...
متن کاملSpeaker normalization with all-pass transforms
Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match some speaker independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme wherein the frequency axis of the short-time spectrum associated with a speaker's speech is rescaled or warped prior to the extraction of cepstral...
متن کاملSpeaker adaptation with all-pass transforms
In recent work, a class of transforms were proposed which achieve a remapping of the frequency axis much like conventional vocal tract length normalization. These mappings, known collectively as all-pass transforms (APT), were shown to produce substantial improvements in the performance of a large vocabulary speech recognition system when used to normalize incoming speech prior to recognition. ...
متن کاملFrequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC
Vocal Tract Length Normalization (VTLN) for standard filterbank-based Mel Frequency Cepstral Coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion (Lee and Rose, 1998). A linear transform (LT) equivalent for frequency warping (FW) would enable more efficie...
متن کامل